Dynamic rules engine in a cloud-based sandbox

ABSTRACT

Computer-implemented systems and methods include receiving unknown content in a cloud-based sandbox; performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; obtaining events based on the analysis; running one or more rules on the events; and adjusting the score based on a result of the one or more rules. The systems and methods can include classifying the unknown content as malware or clean based on the adjusted score. The analysis can include a static analysis and a dynamic analysis, with the events generated based thereon.

FIELD OF THE DISCLOSURE

The present disclosure relates generally to computer networking systems and methods. More particularly, the present disclosure relates to systems and methods for cloud-based malware behavior analysis via a dynamic rules engine in a cloud-based sandbox.

BACKGROUND OF THE DISCLOSURE

Malware, short for malicious software, is software used to disrupt computer operation, gather sensitive information, and/or gain access to private computer systems. It can appear in the form of code, scripts, active content, and other software. ‘Malware’ is a general term used to refer to a variety of forms of hostile or intrusive software. Malware includes, for example, computer viruses, ransomware, worms, Trojan horses, rootkits, key loggers, dialers, spyware, adware, malicious Browser Helper Objects (BHOs), rogue security software, and other malicious programs; the majority of active malware threats are usually worms or Trojans rather than viruses. As is widely known, there is a need for security measures to protect against malware and the like. Specifically, there is a need for zero-day/zero-hour protection against a rapidly morphing threat landscape. Security processing is moving to the Cloud including malware detection. For example, cloud-based malware protection is described in commonly-assigned U.S. Pat. Nos. 9,152,789 and 9,609,015, each entitled “Systems and methods for dynamic cloud-based malware behavior analysis,” the contents of each are incorporated herein by reference.

With cloud-based malware protection, there needs to be a way to quickly detect malware and pass this detection on to provide zero-day/zero-hour protection. There are also needs to improve the efficacy of malware detection, provide malware attribution, improve scoring in malware detection, etc.

BRIEF SUMMARY OF THE DISCLOSURE

The present disclosure relates to systems and methods for cloud-based malware behavior analysis via a dynamic rules engine in a cloud-based sandbox. Computer-implemented systems and methods include receiving unknown content in a cloud-based sandbox; performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; obtaining events based on the analysis; running one or more rules on the events; and adjusting the score based on a result of the one or more rules. The systems and methods can include classifying the unknown content as malware or clean based on the adjusted score. The analysis can include a static analysis and a dynamic analysis, with the events generated based thereon.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated and described herein with reference to the various drawings, in which like reference numbers are used to denote like system components/method steps, as appropriate, and in which:

FIG. 1 is a network diagram of a cloud-based system for implementing various cloud-based service functions including a sandbox;

FIG. 2 is a block diagram of a server which may be used in the cloud-based system of FIG. 1 or the like;

FIG. 3 is a block diagram of a mobile device which may be used in the cloud-based system of FIG. 1 or the like;

FIG. 4 is a flowchart of a behavioral analysis method in the cloud;

FIG. 5 is a block diagram of an example implementation of a Behavioral Analysis (BA) system for use with the cloud-based system or any other cloud-based system;

FIGS. 6-8 are flowcharts of example operational methods associated with the BA system of FIG. 5 including methods performed by the server in the cloud components (FIG. 6), the server in the sandbox components (FIG. 7), and the BA controller (FIG. 8);

FIG. 9 is a screenshot of a Dynamic YARA rule name and contextual information in a BA report; and

FIG. 10 is a flowchart of a process for dynamic rules in a cloud-based sandbox.

DETAILED DESCRIPTION OF THE DISCLOSURE

Again, the present disclosure relates to systems and methods for cloud-based malware behavior analysis via a dynamic rules engine in a cloud-based sandbox. The systems and methods leverage a distributed, cloud-based security system to sandbox unknown content in the cloud, to install the unknown content for observation and analysis, and to leverage the results in the cloud for near immediate protection from newly detected malware. Computer-implemented systems and methods include receiving unknown content in a cloud-based sandbox; performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; obtaining events based on the analysis; running one or more rules on the events; and adjusting the score based on a result of the one or more rules. The systems and methods can include classifying the unknown content as malware or clean based on the adjusted score. The analysis can include a static analysis and a dynamic analysis, with the events generated based thereon.
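
For illustration only, the sequence of obtaining events, running rules on them, adjusting the score, and classifying the content can be sketched in Python as follows. The Event and Rule types, the 0 to 100 score range, the per-rule score delta, and the threshold of 70 are all assumptions made for this sketch; the disclosure does not specify them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event type, e.g. "registry_set"
    detail: str  # free-form description of what was observed


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[list[Event]], bool]  # predicate over all events
    score_delta: int                        # assumed adjustment when the rule fires


def adjust_score(base_score: int, events: list[Event], rules: list[Rule]) -> tuple[int, list[str]]:
    """Run every rule on the events and return (adjusted score, fired rule names)."""
    score = base_score
    fired = []
    for rule in rules:
        if rule.matches(events):
            score += rule.score_delta
            fired.append(rule.name)
    # Clamp to the assumed 0..100 range.
    return max(0, min(100, score)), fired


def classify(score: int, threshold: int = 70) -> str:
    """Classify as malware or clean; the threshold is an assumption."""
    return "malware" if score >= threshold else "clean"


events = [
    Event("registry_set", "autorun key written"),
    Event("network_connect", "outbound connection on port 443"),
]
rules = [
    Rule("persistence_via_autorun",
         lambda evs: any(e.kind == "registry_set" and "autorun" in e.detail for e in evs),
         30),
    Rule("drops_executable",
         lambda evs: any(e.kind == "file_write" for e in evs),
         20),
]
adjusted, fired = adjust_score(50, events, rules)
verdict = classify(adjusted)
```

In this sketch only the first rule fires, so a base score of 50 is raised by 30 and the content crosses the assumed threshold.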

Example Cloud System Architecture

FIG. 1 is a network diagram of a cloud-based system 100 for implementing various cloud-based service functions including a sandbox 101. The cloud-based system 100 includes one or more cloud nodes (CN) 102 communicatively coupled to the Internet 104 or the like. The cloud nodes 102 may be implemented as a server 200 (as illustrated in FIG. 2) or the like, and can be geographically diverse from one another such as located at various data centers around the country or globe. For illustration purposes, the cloud-based system 100 can include a regional office 110, headquarters 120, various employees' homes 130, laptops/desktops 140, and mobile devices 150 each of which can be communicatively coupled to one of the cloud nodes 102. These locations 110, 120, 130 and devices 140, 150 are shown for illustrative purposes, and those skilled in the art will recognize there are various access scenarios to the cloud-based system 100 all of which are contemplated herein.

Again, the cloud-based system 100 can provide any functionality through services such as Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), Security as a Service, Virtual Network Functions (VNFs) in a Network Functions Virtualization (NFV) Infrastructure (NFVI), etc. to the locations 110, 120, 130 and devices 140, 150. The cloud-based system 100 is replacing the conventional deployment model where network devices are physically managed and cabled together in sequence to deliver the various services associated with the network devices. The cloud-based system 100 can be used to implement these services in the cloud without end-users requiring the physical devices and management thereof. The cloud-based system 100 can provide services via VNFs (e.g., firewalls, Deep Packet Inspection (DPI), Network Address Translation (NAT), etc.). VNFs take the responsibility of handling specific network functions that run on one or more virtual machines (VMs), software containers, etc., on top of the hardware networking infrastructure (routers, switches, etc.). Individual VNFs can be connected or combined together as building blocks in a service chain to offer a full-scale networking communication service.

Two example services include Zscaler Internet Access (ZIA) (which can generally be referred to as Internet Access (IA)) and Zscaler Private Access (ZPA) (which can generally be referred to as Private Access (PA)), from Zscaler, Inc. (the assignee and applicant of the present application). The IA service can include firewall, threat prevention, DPI, Data Leakage Prevention (DLP), and the like. The PA service can include access control, microservice segmentation, etc. For example, the IA service can provide a user with Internet Access, and the PA service can provide a user with access to enterprise resources in lieu of traditional Virtual Private Networks (VPNs).

Cloud computing systems and methods abstract away physical servers, storage, networking, etc. and instead offer these as on-demand and elastic resources. The National Institute of Standards and Technology (NIST) provides a concise and specific definition which states cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing differs from the classic client-server model by providing applications from a server that are executed and managed by a client's web browser or the like, with no installed client version of an application required. Centralization gives cloud service providers complete control over the versions of the browser-based and other applications provided to clients, which removes the need for version upgrades or license management on individual client computing devices. The phrase SaaS is sometimes used to describe application programs offered through cloud computing. A common shorthand for a provided cloud computing service (or even an aggregation of all existing cloud services) is “the cloud.” The cloud-based system 100 is illustrated herein as one example embodiment of a cloud-based system, and those of ordinary skill in the art will recognize the systems and methods described herein contemplate operation with any cloud-based system.

In an embodiment, the cloud-based system 100 can be a distributed security system or the like. Here, in the cloud-based system 100, traffic from various locations (and various devices located therein) such as the regional office 110, the headquarters 120, various employees' homes 130, laptops/desktops 140, and mobile devices 150 can be monitored (e.g., inline) or redirected to the cloud through the cloud nodes 102. That is, each of the locations 110, 120, 130, 140, 150 is communicatively coupled to the Internet 104 and can be monitored by the cloud nodes 102. The cloud-based system 100 may be configured to perform various functions such as spam filtering, Uniform Resource Locator (URL) filtering, antivirus protection, bandwidth control, DLP, zero-day vulnerability protection, web 2.0 features, and the like. In an embodiment, the cloud-based system 100 may be viewed as Security-as-a-Service through the cloud, such as the IA. For example, the cloud-based system 100 can be used to block or allow access to web sites, files, streaming services, etc. Such access control can be based in part on the systems and methods described herein to identify malware through sandboxing.

Advantageously, the cloud-based system 100, when operating as a distributed security system, avoids platform-specific security apps on the mobile devices 150, forwards web traffic through the cloud-based system 100, enables network administrators to define policies in the cloud, and enforces/cleans traffic in the cloud prior to delivery to the mobile devices 150. Further, through the cloud-based system 100, network administrators may define user-centric policies tied to users, not devices, with the policies being applied regardless of the device used by the user. The cloud-based system 100 provides 24×7 security with no need for updates as the cloud-based system 100 is always up to date with current threats and without requiring device signature updates. Also, the cloud-based system 100 enables multiple enforcement points, centralized provisioning, and logging, automatic traffic routing to the nearest cloud node 102, the geographical distribution of the cloud nodes 102, policy shadowing of users which is dynamically available at the cloud nodes 102, etc.

In an embodiment, each of the cloud nodes 102 may include a decision system, e.g., data inspection engines that operate on a content item, e.g., a Web page, a file, an email message, or some other data or data communication that is sent from or requested by a user device 300. In an embodiment, all data destined for or received from the Internet 104 is processed through one of the cloud nodes 102. In another embodiment, specific data specified by policy, e.g., only email, only executable files, etc., is processed through one of the cloud nodes 102.

Each of the cloud nodes 102 may generate a decision vector D=[d1, d2, . . . , dn] for a content item of one or more parts C=[c1, c2, . . . , cm]. Each decision vector may identify a threat classification, e.g., clean, spyware, malware, undesirable content, innocuous, spam email, unknown, etc. For example, the output of each element of the decision vector D may be based on the output of one or more data inspection engines. In an embodiment, the threat classification may be reduced to a subset of categories, e.g., violating, non-violating, neutral, unknown. Based on the subset classification, the cloud node 102 may allow the distribution of the content item, preclude distribution of the content item, allow distribution of the content item after a cleaning process, or perform threat detection on the content item. In an embodiment, the actions taken by one of the cloud nodes 102 may depend on the threat classification of the content item and on a security policy of the external system from which the content item is being sent or by which the content item is being requested. A content item is violating if, for any part C=[c1, c2, . . . , cm] of the content item, at any of the cloud nodes 102, any one of the data inspection engines generates an output that results in a classification of “violating.”
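
As a minimal sketch of the reduction described above, the following Python maps the elements of a decision vector onto the reduced subset of categories. Only the rule that any “violating” output makes the whole content item violating comes from the text; the label-to-category mapping and the precedence among the remaining categories are assumptions made for illustration.

```python
from enum import Enum


class Verdict(Enum):
    VIOLATING = "violating"
    NON_VIOLATING = "non-violating"
    NEUTRAL = "neutral"
    UNKNOWN = "unknown"


# Assumed mapping from fine-grained threat labels to the reduced subset.
REDUCTION = {
    "clean": Verdict.NON_VIOLATING,
    "innocuous": Verdict.NEUTRAL,
    "spyware": Verdict.VIOLATING,
    "malware": Verdict.VIOLATING,
    "undesirable content": Verdict.VIOLATING,
    "spam email": Verdict.VIOLATING,
    "unknown": Verdict.UNKNOWN,
}


def reduce_decisions(decisions: list[str]) -> Verdict:
    """Reduce a decision vector D = [d1, ..., dn] to one verdict for the content item."""
    verdicts = [REDUCTION.get(label, Verdict.UNKNOWN) for label in decisions]
    # From the text: any violating output makes the content item violating.
    if Verdict.VIOLATING in verdicts:
        return Verdict.VIOLATING
    # Assumed precedence for the remaining categories.
    if not verdicts or Verdict.UNKNOWN in verdicts:
        return Verdict.UNKNOWN
    if all(v is Verdict.NEUTRAL for v in verdicts):
        return Verdict.NEUTRAL
    return Verdict.NON_VIOLATING
```

A cloud node could then select an action (allow, preclude, clean, or perform threat detection) from the returned verdict together with the applicable security policy.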

In an embodiment, one or more of the cloud nodes 102 can be a Central Authority (CA) node 102A that communicates with the other cloud nodes 102. The CA nodes 102A may store policy data for each user and may distribute the policy data to each of the cloud nodes 102. The policy may, for example, define security policies for a protected system, e.g., security policies for an enterprise. Example policy data may define access privileges for users, web sites, and/or content that is disallowed, restricted domains, etc. The CA nodes 102A may distribute the policy data to the cloud nodes 102. In an embodiment, the CA nodes 102A may also distribute threat data that includes the classifications of content items according to threat classifications, e.g., a list of known viruses, a list of known malware sites, spam email domains, a list of known phishing sites, known malware content, etc. The distribution of threat data between the CA nodes 102A and the cloud nodes 102 may be implemented by push and pull distribution schemes described in more detail below. In an embodiment, the CA nodes 102A can continually update the cloud nodes 102 with newly detected malware as described herein through the sandbox 101 for zero-day/zero-hour protection.

Example Server Architecture

FIG. 2 is a block diagram of a server 200 which may be used in the cloud-based system 100, in other systems, or standalone. For example, the cloud nodes 102 may be formed as one or more of the servers 200. The server 200 may be a digital computer that, in terms of hardware architecture, generally includes a processor 202, Input-Output (I/O) interfaces 204, a network interface 206, a data store 208, and memory 210. It should be appreciated by those of ordinary skill in the art that FIG. 2 depicts the server 200 in an oversimplified manner, and a practical embodiment may include additional components and suitably configured processing logic to support known or conventional operating features that are not described in detail herein. The components (202, 204, 206, 208, and 210) are communicatively coupled via a local interface 212. The local interface 212 may be, for example, but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface 212 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, among many others, to enable communications. Further, the local interface 212 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

The processor 202 is a hardware device for executing software instructions. The processor 202 may be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the server 200, a semiconductor-based microprocessor (in the form of a microchip or chipset), or generally any device for executing software instructions. When the server 200 is in operation, the processor 202 is configured to execute software stored within the memory 210, to communicate data to and from the memory 210, and to generally control operations of the server 200 pursuant to the software instructions. The I/O interfaces 204 may be used to receive user input from and/or for providing system output to one or more devices or components.

The network interface 206 may be used to enable the server 200 to communicate on a network, such as the Internet 104. The network interface 206 may include, for example, an Ethernet card or adapter (e.g., 10BaseT, Fast Ethernet, Gigabit Ethernet, 10GbE) or a Wireless Local Area Network (WLAN) card or adapter (e.g., 802.11a/b/g/n/ac). The network interface 206 may include address, control, and/or data connections to enable appropriate communications on the network. A data store 208 may be used to store data. The data store 208 may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, and the like)), nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, and the like), and combinations thereof. Moreover, the data store 208 may incorporate electronic, magnetic, optical, and/or other types of storage media. In one example, the data store 208 may be located internal to the server 200 such as, for example, an internal hard drive connected to the local interface 212 in the server 200. Additionally, in another embodiment, the data store 208 may be located external to the server 200 such as, for example, an external hard drive connected to the I/O interfaces 204 (e.g., SCSI or USB connection). In a further embodiment, the data store 208 may be connected to the server 200 through a network, such as, for example, a network-attached file server.

The memory 210 may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)), nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.), and combinations thereof. Moreover, the memory 210 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 210 may have a distributed architecture, where various components are situated remotely from one another, but can be accessed by the processor 202. The software in memory 210 may include one or more software programs, each of which includes an ordered listing of executable instructions for implementing logical functions. The software in the memory 210 includes a suitable Operating System (O/S) 214 and one or more programs 216. The operating system 214 essentially controls the execution of other computer programs, such as the one or more programs 216, and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. The one or more programs 216 may be configured to implement the various processes, algorithms, methods, techniques, etc. described herein.

Example User Device Architecture

FIG. 3 is a block diagram of a user device 300, which may be used in the cloud-based system 100 or the like. Again, the user device 300 can be a smartphone, a tablet, a smartwatch, an Internet of Things (IoT) device, a laptop, etc. The user device 300 can be a digital device that, in terms of hardware architecture, generally includes a processor 302, I/O interfaces 304, a radio 306, a data store 308, and memory 310. It should be appreciated by those of ordinary skill in the art that FIG. 3 depicts the user device 300 in an oversimplified manner, and a practical embodiment may include additional components and suitably configured processing logic to support known or conventional operating features that are not described in detail herein. The components (302, 304, 306, 308, and 310) are communicatively coupled via a local interface 312. The local interface 312 can be, for example, but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface 312 can have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, among many others, to enable communications. Further, the local interface 312 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

The processor 302 is a hardware device for executing software instructions. The processor 302 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the user device 300, a semiconductor-based microprocessor (in the form of a microchip or chipset), or generally any device for executing software instructions. When the user device 300 is in operation, the processor 302 is configured to execute software stored within the memory 310, to communicate data to and from the memory 310, and to generally control operations of the user device 300 pursuant to the software instructions. In an embodiment, the processor 302 may include a mobile-optimized processor such as one optimized for power consumption and mobile applications. The I/O interfaces 304 can be used to receive user input from and/or for providing system output. User input can be provided via, for example, a keypad, a touch screen, a scroll ball, a scroll bar, buttons, barcode scanner, and the like. System output can be provided via a display device such as a Liquid Crystal Display (LCD), touch screen, and the like.

The radio 306 enables wireless communication to an external access device or network. Any number of suitable wireless data communication protocols, techniques, or methodologies can be supported by the radio 306, including any protocols for wireless communication. The data store 308 may be used to store data. The data store 308 may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, and the like)), nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, and the like), and combinations thereof. Moreover, the data store 308 may incorporate electronic, magnetic, optical, and/or other types of storage media.

The memory 310 may include any of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)), nonvolatile memory elements (e.g., ROM, hard drive, etc.), and combinations thereof. Moreover, the memory 310 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 310 may have a distributed architecture, where various components are situated remotely from one another but can be accessed by the processor 302. The software in memory 310 can include one or more software programs, each of which includes an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 3, the software in the memory 310 includes a suitable Operating System (O/S) 314 and programs 316. The operating system 314 essentially controls the execution of other computer programs, and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. The programs 316 may include various applications, add-ons, etc. configured to provide end-user functionality with the user device 300. For example, the programs 316 may include, but are not limited to, a web browser, social networking applications, streaming media applications, games, mapping and location applications, electronic mail applications, financial applications, and the like. In a typical example, the end-user uses one or more of the programs 316 along with a network such as the cloud-based system 100.

Cloud-Based Sandboxing

FIG. 4 is a flowchart of a behavioral analysis method 650 in the cloud. The behavioral analysis method 650 can be implemented through the BA system 600 with any cloud-based system. The cloud-based method 650 includes receiving known malware signatures at one or more nodes in a cloud-based system (step 652). The cloud-based method 650 includes monitoring one or more users inline through the one or more nodes in the cloud-based system for regular traffic processing comprising malware detection and preclusion (step 654). Note, the cloud-based system can also monitor for other security aspects (e.g., viruses, spyware, data leakage, policy enforcement, etc.). The cloud-based method 650 includes determining unknown content from a user of the one or more users is suspected of being malware (step 656). The cloud-based method 650 includes sending the unknown content to a behavioral analysis system for an offline analysis (step 658). Finally, the cloud-based method 650 includes receiving updated known malware signatures based on the offline analysis (step 660).
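
The steps of the method 650 can be sketched, under assumptions, as a small Python class. The class name CloudNode, the string return values, and the use of an in-memory queue as a stand-in for the behavioral analysis system are hypothetical; the suspicion check itself is passed in as a flag rather than implemented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CloudNode:
    """Hypothetical inline node holding known malware signatures."""
    known_signatures: set[str] = field(default_factory=set)
    ba_queue: deque = field(default_factory=deque)  # stand-in for the BA system

    def receive_signatures(self, signatures) -> None:
        # Steps 652 and 660: receive initial or updated known malware signatures.
        self.known_signatures.update(signatures)

    def process(self, signature: str, is_suspicious: bool) -> str:
        # Step 654: regular inline processing precludes known malware.
        if signature in self.known_signatures:
            return "block"
        # Steps 656 and 658: suspicious unknown content goes to offline analysis.
        if is_suspicious:
            self.ba_queue.append(signature)
            return "sent_for_analysis"
        return "allow"


node = CloudNode()
node.receive_signatures({"sig-a"})
first = node.process("sig-a", is_suspicious=False)
second = node.process("sig-b", is_suspicious=True)
# After the offline analysis flags the content, the node receives the update.
node.receive_signatures({"sig-b"})
third = node.process("sig-b", is_suspicious=True)
```

Whether content that has been sent for analysis is blocked or allowed in the meantime is a policy decision and is left out of this sketch.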

The cloud-based method 650 can include performing one of blocking or allowing the unknown content to or from the user based on policy. The one or more users can include a plurality of users associated with a plurality of companies, and the cloud-based method 650 can further include receiving a policy setting for each of the plurality of companies, wherein the policy setting comprises whether or not to perform the offline analysis for the unknown content; and performing the regular traffic processing for the unknown content for users associated with companies with the policy setting of not performing the offline analysis, wherein the regular traffic processing comprises monitoring for malware based on the offline analysis of other users. The cloud-based method 650 can include determining unknown content is suspicious based on an analysis in the one or more nodes based on smart filtering determining that the unknown content is an unknown, active software file that performs some functionality on the user's device. The cloud-based method 650 can include storing the unknown content in the behavioral analysis system and maintaining an event log associated with the unknown content in the behavioral analysis system; and performing the offline analysis on the unknown content comprising a static analysis and a dynamic analysis. The unknown content can be stored in an encrypted format, and the cloud-based method 650 can include storing results data from various stages of the offline analysis of the unknown content, wherein the results data includes static analysis results, JavaScript Object Notation (JSON) data from the dynamic analysis, packet capture data, screenshot images, and files created/deleted/downloaded during the dynamic analysis.
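
One way to picture the per-company policy setting is the following Python sketch. The CompanyPolicy fields, the company names, and the default applied to companies without a stored policy are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyPolicy:
    offline_analysis: bool  # whether unknown content is sent for offline analysis
    block_unknown: bool     # whether unknown content is blocked or allowed


# Hypothetical policy table keyed by company name.
POLICIES = {
    "company-a": CompanyPolicy(offline_analysis=True, block_unknown=True),
    "company-b": CompanyPolicy(offline_analysis=False, block_unknown=False),
}

# Assumed default for companies without a stored policy.
DEFAULT_POLICY = CompanyPolicy(offline_analysis=False, block_unknown=False)


def handle_unknown_content(company: str) -> tuple[str, bool]:
    """Return (inline action, whether to submit the content for offline analysis)."""
    policy = POLICIES.get(company, DEFAULT_POLICY)
    action = "block" if policy.block_unknown else "allow"
    return action, policy.offline_analysis
```

Companies that opt out of the offline analysis still benefit indirectly, since the regular traffic processing uses signatures produced by the analysis of other users' content.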

The static analysis can evaluate various properties of the unknown content, and the dynamic analysis runs the unknown content on a virtual machine operating an appropriate operating system for the unknown content. The cloud-based method 650 can include performing the offline analysis as a combination of a static analysis and a dynamic analysis by the behavioral analysis system. The static analysis can evaluate various properties of the unknown content using a set of tools based on a type of file of the unknown content, wherein the set of tools comprises any of checking third party services to match the unknown content to known viruses detected by various anti-virus engines, using a Perl Compatible Regular Expressions (PCRE) engine to check the unknown content for known signatures, identifying code signing certificates to form a whitelist of known benign content using Portable Executable (PE)/Common Object File Format (COFF) specifications, and evaluating destinations of any communications from the dynamic analysis. The dynamic analysis can run the unknown content on a virtual machine operating an appropriate operating system for the unknown content and evaluate any of JavaScript Object Notation (JSON) data generated; temporary files generated; system and registry files modified; files added or deleted; processor, network, memory and file system usages; external communications; security bypass; data leakage; and persistence.
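
The signature check in the static analysis can be illustrated with a short Python sketch. Python's built-in re module is used here as a stand-in and is not a PCRE engine; the signature names and patterns are hypothetical and do not describe real malware.

```python
import re

# Hypothetical byte-level signatures; real signatures would come from the BA system.
SIGNATURES = {
    "hypothetical_encoded_command": re.compile(rb"powershell(\.exe)?\s+-enc", re.IGNORECASE),
    "hypothetical_marker": re.compile(rb"EVIL_MARKER_[0-9]{4}"),
}


def match_signatures(content: bytes) -> list[str]:
    """Return the sorted names of all signatures found in the content."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(content))
```

Matched names could be emitted as events for later rule evaluation, while an empty result only means that none of the configured signatures matched, not that the content is benign.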

Sandbox System

FIG. 5 is a block diagram of an example implementation of a Behavioral Analysis (BA) system 700 for use with the cloud-based system 100 or any other cloud-based system. FIG. 5 is presented as an example implementation for the sandbox 101, and those of ordinary skill in the art will appreciate other implementations providing similar functionality are also contemplated. The BA system 700 can include cloud components 702 and sandbox components 704. The cloud components 702 can include the cloud nodes 102, etc. The cloud components 702 are generally used to monitor users in the cloud, to detect known malware, to provide unknown files that could be malware to the sandbox components 704, and to receive updates to known malware from the sandbox components 704. The sandbox components 704 are generally configured to receive unknown files and determine whether they are malicious (malware) or benign and provide this information to the cloud components 702. The sandbox components 704 can perform a static analysis and a dynamic analysis of the unknown files in an offline manner whereas the cloud components 702 are configured to detect malware inline. As described herein, the sandbox components 704 can also be referred to as BA infrastructure.

The cloud components 702 can include a server 710 (or plurality of servers 710), a data store 712, and a user interface (UI) 714. The server 710 can include the processing nodes 110, the cloud nodes 102, etc. and the server 710 is generally the initiator and final consumer of results from the BA system 700, i.e., the server 710 inter alia detects and precludes malware as well as flagging unknown files for BA analysis by the BA system 700. The data store 712 can be a storage mechanism for all transaction logs and reporting mechanisms. The UI 714 can provide the ability to configure BA policies as well as to turn BA on/off at a company level. The UI 714 is also the gateway to all reports and forensic analysis. The sandbox components 704 can include a server 720, a BA controller 722, a BAUI 724, and a Virtual Machine (VM) server 726. The server 720 provides a gateway to the BA infrastructure in the sandbox components 704 and acts as a consolidated secure (encrypted) storage server for BA content. The BA controller 722 provides sandboxing functionality for performing dynamic analysis of BA content. The BAUI 724 provides a user interface to view the analysis results of BA content. Finally, the VM server 726 provides a VM infrastructure used by the BA controller 722 for dynamic analysis of BA content. Note, the cloud components 702 and the sandbox components 704, as described herein, can be a combination of hardware, software, and/or firmware for performing the various functionality described herein. FIGS. 6-8 are flowcharts of example operational methods 800, 802, 804 performed by the server 710 (FIG. 6), the server 720 (FIG. 7), and the BA controller 722 (FIG. 8).

Variously, the sandbox components 704 are configured to distribute known malware signatures to the cloud components 702, e.g., the distributed cloud enforcement nodes. The cloud components 702 monitor inline users such as using HTTP and non-HTTP protocols (to cover proxy and firewall/DPI) to detect and block/preclude malware. In addition, the cloud components 702 perform intelligent collection of unknown malware from distributed cloud enforcement nodes. The enforcement nodes decide what is unknown malware, using smart filtering based on signatures and static/dynamic analysis criteria that can be performed quickly inline, and send it securely and efficiently to the BA Analysis engine in the cloud, i.e., the sandbox components 704. The sandbox components 704 form a BA Analysis Engine that includes secure content storage with data destruct capabilities, provides a scalable and flexible platform for VM based execution sandboxes, includes a smart scheduler to determine what needs to be analyzed and manage BA content from the cloud, and includes threat reporting storage and UI infrastructure for malware result analysis and research. The sandbox components 704 can provide dynamic updates based on latest malware analysis thereby providing zero-day/zero-hour protection.

FIG. 6 illustrates an operational method 800 performed by the cloud components 702, such as the server 710. The server 710 is the initiator for the BA logic sequence. Generally, the server 710 is configured to process policy information related to BA, and this can be managed with flags to enable/disable the feature at the company level. The server 710 is further configured to consume signatures (related to BA) that are created by the BA infrastructure, i.e. the sandbox components 704 and the like. The signatures can be in the form of MD5 hashes or the like. The server 710 is configured to enforce policy based on configuration, to log transactions to the data store 712 with information included therein such as policy reason and Threat category/super category information, and to send BA content to the BA infrastructure (specifically the server 720). In an embodiment, the server 710 can be the cloud node 102, etc. That is, the server 710 is generally performing inline traffic processing between a user and another domain mechanisms as a cloud-based system (security-as-a-service).

The server 710 can perform various aspects of inline traffic processing such as virus detection and prevention, malware detection and prevention, data leakage prevention, policy enforcement, etc. The focus here is on malware detection and prevention, but it is expected that the server 710 also provides other security functions. As described herein, malware includes code, scripts, active content, and other software that is used to disrupt computer operation, gather sensitive information, and/or gain access to private computer systems. That is, malware is active software installed on a user's device for malicious purposes and can include executable files (e.g., .EXE), Dynamic Link Libraries (DLL), documents (e.g., .DOCX, .PDF, etc.), etc. The server 710, in conjunction with the server 720, can include a set of known malware that is detected and precluded. However, as malware is constantly evolving, there is a need to detect new malware files quickly (zero-day/zero-hour protection). This is the objective of the BA infrastructure: to sandbox potential files for malware BA and to update the set of known malware based thereon.

The operational method 800 starts and determines if a BA policy applies (step 802). The BA policy determines whether or not processing for a particular user, company, set of users, etc. utilizes the BA infrastructure. Note, the BA policy does not mean whether or not the server 710 scans for known malware; rather the BA policy determines whether the server 710 performs BA on unknown files that could possibly be malware to detect new malware and add to the list of known malware. If there is no BA policy (step 802), the operational method 800 performs regular traffic processing (step 804). The regular traffic processing can include the various techniques and processes described herein for security in the cloud, and the operational method 800 stops (step 806). If there is a BA policy (step 802), the operational method 800 checks if the content is suspicious (step 808). Content may be suspicious, from a malware perspective, if it is unknown, active software that performs some functionality on the user's device. Determining the content is suspicious can be based on smart filtering that performs a quick analysis inline in the cloud. If the content is not suspicious (step 808), the operational method 800 checks if the content is already classified by the BA or another system (step 810), and if so, the operational method 800 makes a log transaction for the content with a policy reason as BA (step 812). If the content is not already classified (step 810), the operational method 800 performs the regular traffic processing (step 804).

If the content is suspicious (step 808), the operational method 800 checks whether the policy is to block or not (step 814). Note, suspicious content may or may not be malware; it is the purpose of the BA infrastructure (e.g., the sandbox components 704) to determine this. However, the operational method 800 can allow or block the suspicious content (while also sending the suspicious content to the BA infrastructure). If the policy is not to block (step 814), the operational method 800 sends the content to the BA infrastructure (e.g., the sandbox components 704 for performing the functionality in FIGS. 7 and 8) (step 816). Next, the operational method 800 performs regular traffic processing (step 818) (same as step 804), the operational method 800 logs the transaction as a policy reason BA allow (step 820), and the operational method 800 ends (step 822). If the policy is to block (step 814), the operational method 800 blocks the content and shows the user a block page (step 824). The block page notifies the user that the content was suspicious and blocked. The operational method 800 sends the content to the BA infrastructure (e.g., the sandbox components 704 for performing the functionality in FIGS. 7 and 8) (step 826), the operational method 800 logs the transaction as a policy reason BA block (step 820), and the operational method 800 ends (step 822).
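The branching of the operational method 800 can be summarized in a short Python sketch. The `Content` class, the `handle_content` function, and the action strings are illustrative names and not part of the disclosure; the sketch only mirrors the order of the decisions in steps 802 through 826.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Content:
    suspicious: bool          # outcome of the inline smart filtering (step 808)
    already_classified: bool  # classified earlier by BA or another system (step 810)


def handle_content(content: Content, ba_policy: bool, block_policy: bool) -> List[str]:
    """Return the ordered actions taken for one piece of content."""
    if not ba_policy:                          # step 802: no BA policy applies
        return ["regular_processing"]          # step 804
    if not content.suspicious:                 # step 808
        if content.already_classified:         # step 810
            return ["log_policy_reason_ba"]    # step 812
        return ["regular_processing"]          # step 804
    if block_policy:                           # step 814: policy is to block
        return ["block_and_show_block_page", "send_to_sandbox", "log_ba_block"]
    return ["send_to_sandbox", "regular_processing", "log_ba_allow"]
```

In both suspicious branches the content is still sent to the BA infrastructure; the policy only decides whether the user receives the content in the meantime.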

The UI 714 provides the ability to configure policy at the company level, or at some set or subset of users, with features that are enabled/disabled using a few checkboxes, for example. The UI 714 provides a high-level view of the BA system 700 for a company using specific BA reports, and the UI 714 provides the ability to view analysis details of any threat from transaction drill-downs. The data store 712 is configured to store transaction logs from the server 710, to provide counter infrastructure for all BA reports, and to provide querying infrastructure for BA transactions. For example, the data store 712 can add a new BA record and handle it in live/sync data paths, perform query module handling for this new BA record, add new BA filters such as MD5, perform BA counter handling, and the like. For example, the counter infrastructure can use the following dimensions:

Dimension       Values
MalwareReason   One of the following values [Submitted, Benign, Suspicious, Adware, Malware, Anonymizer]
Direction       One of the following values [Inbound, Outbound]
Action          Allowed, Blocked

The UI 714 can provide various reports such as a combination of the following filters for drill-down:

Chart Type          Drilldown Area
BA Actions          Blocked
BA Actions          Quarantined
BA Actions          Sent for Analysis
BA Categorization   Suspicious Behavior
BA Categorization   Botnet & Malware Behavior
BA Categorization   Adware Behavior
BA Categorization   Anonymizer Behavior

FIG. 7 illustrates an operational method 802 performed by a gateway element in the BA infrastructure (e.g., the sandbox components 704), such as the server 720. The server 720 is a critical component in the BA architecture that integrates all the other subsystems; it is the central authority for all things involved with BA. The server 720 (or gateway to the BA infrastructure) has the following functional components: a Secure Storage Engine (SSE), a Static Analysis Engine (SAE), a Dynamic Analysis Scheduling Engine (DASE), a Database Engine, a Scoring Engine, and a Reporting Engine. The SSE is responsible for the persistent storage of the BA Content to be analyzed. The results of the analysis are stored in the SSE, as well. All data related to the customers are stored in encrypted format using symmetric keys (e.g., AES256). The encryption keys are generated (well in advance) at regular intervals. The encryption keys are not stored in the SSE. They are retrieved at runtime from the Certificate Management Server (currently Central Authority [SMCA] in the sandbox components 704), i.e. they are retrieved at runtime on the server 720 for use. The SSE can store an activity ledger of everything that has happened to the content, with various events recorded to answer questions such as what happened to the content and what state the content is in, and, in case of a crash, to continue processing the content from where it was left off. Example events can include storing the content, completing a static analysis of the content, starting a dynamic analysis of the content, completing the dynamic analysis of the content, calculating a final score for the content, and modifying the score of the content. The SSE can also store results data at various stages of analysis of the content, such as Static Analysis Results, JavaScript Object Notation (JSON) data from the Dynamic Analysis, Packet Capture Data, Screenshot Images, and Files created/deleted/downloaded during the sandbox analysis.

The BA infrastructure generally uses two techniques, Static Analysis and Dynamic Analysis, to evaluate unknown content to detect malware, and results of the two are scored to determine whether or not the unknown content is malware. Generally, the Static Analysis looks at various properties of the unknown content, whereas the Dynamic Analysis actually runs the unknown content. The SAE analyzes the unknown content for known signatures (benign or malicious) using a set of tools based on the type of the file. Some example tools include:

-   VirusTotal: Using a Web Application Programming Interface (API), the MD5 of the unknown content is sent to a third-party service to check for known viruses as determined by various anti-virus (AV) engines;
-   YARA tool: Using a Perl Compatible Regular Expressions (PCRE) engine, the unknown Content is analyzed for known signatures. The signatures are sourced from various third-party services as well as internally developed by the operators of the distributed security system 100;
-   Certificate Analysis: Using Portable Executable (PE)/Common Object File Format (COFF) specifications, identify the code signing certificates to form a whitelist of known benign content; and
-   Zulu (available from zscaler.com): Using the URL Risk Analyzer, the original URL as well as the IPs and URLs resulting from the Dynamic Analysis are further analyzed.

Basically, the SAE looks for known attributes that could indicate that the unknown content is malware, such as by checking previously detected signatures, detecting known malware signatures, analyzing the source of the unknown content, etc.

The DASE schedules the Dynamic Analysis, which is performed by the BA controller 722 and VM server 726. The Dynamic Analysis can be referred to as sandboxing where the unknown content is thrown into a “sandbox,” i.e., the VM server 726, and run to see what happens. The DASE is configured to schedule the unknown content within the limitations of the Sandboxing Infrastructure (i.e., the BA controller 722 and the VM server 726). The DASE can act as queuing manager and scheduler. After static analysis, unknown content can be queued based on priority (known viruses get lower priority), availability, and content type. For example, if an unknown content is identified as a Windows executable/DLL, it needs to be sent to the BA controller 722 which uses a Windows guest Operating System (OS); if an unknown content is identified as an Android application package file (APK), it needs to be sent to the BA controller 722 which uses an Android OS, etc.
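The queuing behavior of the DASE can be illustrated with a minimal Python sketch built on a priority queue. The class name `DynamicAnalysisQueue`, the `GUEST_OS_BY_TYPE` mapping, and the two-level priority are assumptions made for illustration; availability of the sandboxing infrastructure is not modelled here.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical mapping from detected content type to the guest OS that must run it.
GUEST_OS_BY_TYPE = {"exe": "windows", "dll": "windows", "apk": "android"}


class DynamicAnalysisQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, content_id: str, content_type: str, known_virus: bool) -> None:
        # Known viruses get lower priority, i.e. a larger number that is served later.
        priority = 1 if known_virus else 0
        heapq.heappush(self._heap, (priority, next(self._order), content_id, content_type))

    def next_job(self) -> Tuple[str, str]:
        """Pop the highest-priority item and return (content_id, guest_os)."""
        _, _, content_id, content_type = heapq.heappop(self._heap)
        return content_id, GUEST_OS_BY_TYPE[content_type]
```

A caller would submit content after the static analysis and pull jobs whenever a guest of the matching type becomes free.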

The Database Engine is used to maintain a view of data as stored in the SSE. Customer-centric data that is required to be stored in an encrypted format may not be stored in the database. This is a temporary arrangement for quicker access to preformatted data for research purposes. The database tables can be designed in such a way so as to avoid row updates (as much as possible) during runtime. In case of any conflicts with the data in the SSE, the SSE can be the authority, and the view in the database can be recreated at any point from the data in the SSE. The Scoring Engine is for analyzing the results using a configurable scoring sheet to arrive at a final score for the unknown content once all of the Behavioral Analysis is complete. For example, the Scoring Sheet is a file serialized in JSON format that provides individual scores for various components in the analysis. The Reporting Engine provides a querying interface for the BAUI 724 to display the required results of the Behavioral Analysis to the user. The results for the commands can be retrieved from one of the following sources: information available in memory (cache), such as score, category, etc.; information available on disk (SSE), such as packet captures, screenshots, etc.; information available in the database, such as Protocol Information (HTTP/SMTP), etc.; and any combination thereof.

The server 720 interfaces to the server 710 (receiving BA content from the server 710 and sending BA signatures to the server 710), the BAUI 724 (sending BA results to the BAUI 724 and receiving BA requests from the BAUI 724), and the BA controller 722 (queuing a Dynamic Analysis by the BA controller 722 and receiving Dynamic Analysis results from the BA controller 722). The operational method 802 starts, such as at startup of the server 720, and waits for new BA content (steps 850, 852). The operational method 802 stores new content in the SSE (step 854), and performs the Static Analysis (SA) (step 856). The operational method 802 stores the SA results in the SSE (step 858) and schedules the BA content for Dynamic Analysis (DA) with the BA controller 722 (step 860). The operational method 802 waits for completion of the DA (steps 862, 864). The operational method 802 receives results of the DA from the BA controller 722 (step 866).

Next, the operational method 802 can perform a static analysis for file system changes in the DA (step 868). Here, the operational method 802 is looking to see what changes the BA content made when executed or opened in the DA. The operational method 802 stores the DA results in the SSE (step 870). The operational method 802 calculates a final score for the BA content using all results, both SA and DA (step 872). The final score can also be manually modified if reviewed by operators of the BA system 700. The final score is stored in the SSE (step 874), the operational method 802 stores the results view in the database (step 876), and the operational method 802 ends (step 878).

FIG. 8 illustrates an operational method 804 performed by the BA controller 722 in the BA infrastructure (e.g., the sandbox components 704). The BA controller 722 is the engine that controls the sandboxing environment. The sandbox is used to execute the BA content in a controlled VM environment, such as on the VM server 726. It then evaluates the file system changes, network activity, etc., to analyze the threat posed by the BA content. The BA controller 722 performs the following functions: receives BA Content from the server 720 and sends it for execution (Dynamic Analysis (DA)) with one of the available VM guests on the VM server 726; accumulates all the pertinent results (results in JSON format, packet capture, screenshots, file system changes, etc.) from the DA and sends them to the server 720; cleans up temporary files generated; and tracks CPU, network, memory and file system usages on the controller for monitoring. Note, the VM server 726 can be implemented on the BA controller 722 or in another device.

The operational method 804 starts and waits for BA content (steps 902, 904). The operational method 804 schedules received BA content for the Dynamic Analysis with a VM (step 906). The operational method 804 waits for completion of the DA (steps 908, 910). The operational method 804 accumulates results of the DA (e.g., packet capture (PCAP), screenshots, files, JSON, etc.). The operational method 804 sends the DA results to the server 720 (step 912), and the operational method 804 ends (step 916).

The VM server 726 provides a VM infrastructure for use by the BA controller 722 for Dynamic Analysis. The VM server 726 can utilize conventional sandboxing functionality, and can operate all Windows-based systems (Windows XP, Windows 7 32/64 bit, Windows 8/8.1 32/64 bit, Windows 10, etc.) as well as Android, iOS, macOS, Linux, etc. The BAUI 724 is a web application deployed on a server in the sandbox components 704. It can also be deployed on separate hardware. It primarily provides the following functionality: provides a user interface for the detailed analysis of a BA Content, and provides a user interface for the Security Research team to manage the various threats.

Dynamic YARA

YARA is the name of a tool primarily used in malware research and detection that provides a rule-based approach to create descriptions of malware families based on textual or binary patterns. A description is essentially a YARA rule name, where these rules include sets of strings and a Boolean expression. The language used has traits of Perl compatible regular expressions.

The present disclosure provides an approach to enhance the detection capabilities of a cloud sandbox 101. At times there are cases where it is not possible to modify Sandbox signatures due to risk of False Negatives. The present disclosure can address the specific False Positive (FP) cases. Features of the present disclosure include Malware detection efficacy, Malware attribution, Dynamic scoring, Writing a YARA rule on unpacked Portable Executable (PE) files, and Dynamic chaining of cloud sandbox signatures.

The Portable Executable format is a file format for executables, object code, DLLs, FON Font files, and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code.

The approach described herein includes three components: 1) Dynamic YARA engine, 2) Dynamic YARA Python signature, and 3) Dynamic YARA rules.

Dynamic YARA Engine

The Dynamic YARA engine is part of the sandbox 101 and configured to generate events. Specifically, the Dynamic YARA engine collects data (hereafter referred to as dynamic data) from different sandbox events, with some examples listed in Table 1. These sandbox events provide dynamic and static information about the malware samples.

TABLE 1 Sandbox events and dynamic data fields

Sandbox event name: staticgen
  Event data used: File extension (extracted value)
  Field name in dynamic data: staticgen:filetype:
  Example: staticgen:filetype:exe

Sandbox event name: sigid
  Event data used: ID of sandbox signatures hits
  Field name in dynamic data: sigid:
  Example: sigid:767

Sandbox event name: filedumps
  Event data used: Path of all dropped files.
  Field name in dynamic data: filedump:path
  Example: filedump:path:C:\test.txt

Sandbox event name: windows
  Event data used: Title and Text of windows created
  Field name in dynamic data: window:title:
  Example: window:title:Setup
  Field name in dynamic data: window:text:
  Example: window:text:This is installer

Sandbox event name: dnsQuery
  Event data used: dnsQuery name
  Field name in dynamic data: dnsquery:name:
  Example: dnsquery:name:google.com

Sandbox event name: staticOLEEntry
  Event data used: vbacodedeobfuscated data from Macro.
  Field name in dynamic data: staticoleentry:vbacodedeobfuscated:
  Example: staticoleentry:vbacodedeobfuscated:DimVBAMacro_code

Sandbox event name: processCreated
  Event data used: Path and command line value of all created processes.
  Field name in dynamic data: processcreated:path:
  Example: processcreated:path:C:\windows\mal.exe
  Field name in dynamic data: processcreated:cmdline:
  Example: processcreated:cmdline:C:\windows\system32\cmd.exe/c dir

Sandbox event name: memstrings
  Event data used: Memory strings
  Field name in dynamic data: memstring:string:
  Example: memstring:string:Y..y.Hc.H..H.|$8f.\ $0D.\$ L.T$......H.

Sandbox event name: mutantCreated
  Event data used: Mutex name
  Field name in dynamic data: mutantcreated:name:
  Example: mutantcreated:name:_!SHMSFTHISTORY!_

Sandbox event name: http, https, httpData
  Event data used: Header data and extracted other http request fields
  Field name in dynamic data: http:header: (it includes both http and https)
  Example: http:header:POST/ 59C9AEA632140C63AFA3D7318940E42C6CA421A9C1 HTTP/1.1 Accept: */* Content-Type: application/x-www-form-urlencoded User-Agent: Mozilla/5.0 Windows NT 6.1; WOW64; rv:25.0 Gecko/20100101
  Field name in dynamic data: http:rawdata: (data sent or received in http request, converted to hex)
  Example: http:rawdata:80000000302460cac85371cd92dcdf526430fd3b 8f9c2a3f7a5d39af41

Sandbox event name: keyValueCreated
  Event data used: New registry key path, name and newdata created
  Field name in dynamic data: keyvaluecreated:path:
  Example: keyvaluecreated:path:HKEY_USERS\Software\Microsoft\Office\12.0\Word
  Field name in dynamic data: keyvaluecreated:name:
  Example: keyvaluecreated:name:MTTT
  Field name in dynamic data: keyvaluecreated:newdata: (new data added to registry key; data can be ASCII/Unicode and binary, represented as hex string)
  Example: keyvaluecreated:newdata:A40 30000C0CFA8B29444D30100 000000
  Example: keyvaluecreated:newdata:C:\Windows\10923484211833438\winfunx.exe

Sandbox event name: keyValueModified
  Event data used: Modified registry key path, name and newdata
  Field name in dynamic data: keyvaluemodified:path:
  Example: keyvaluemodified:path:HKEY_USERS\Software\Microsoft\Off
  Field name in dynamic data: keyvaluemodified:name:
  Example: keyvaluemodified:name:ReviewToken
  Field name in dynamic data: keyvaluemodified:newdata: (modified data added to registry key; data can be ASCII and binary, represented as hex string)
  Example: keyvaluemodified:newdata:W HTMLControlEvents
  Example: keyvaluemodified:newdata:0F 00000001000000140000007175753454C2982E84ED48F5B 4EE5248

Sandbox event name: memWritten, memAlloc, memProtect, memdumps
  Event data used: Details of memory area modified in all the monitored processes.
  Field name in dynamic data: memwritten:value: (memory data written/modified or extracted PE file content, as hex string)
  Example: memwritten:value:4D5A90000300000004000000FFFF0000B8
  Field name in dynamic data: memwritten:valuelen: (total length of memory)
  Example: memwritten:valuelen:4045
  Field name in dynamic data: memwritten:valuelenA: (length of data written)
  Example: memwritten:valuelenA:400

Dynamic Data

Dynamic data is a collection or dump of all the data received from events, such as those mentioned in Table 1. For most of the events (staticOLEEntry, HTTP, HTTPS, HTTP data, keyValueCreated, keyValueModified, memWritten, memAlloc, memProtect, memdumps), data is normalized/pre-processed before storing it in a dynamic data buffer.
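As a rough illustration of how event values might be serialized with the field-name prefixes of Table 1, consider the following Python sketch. The function names, the one-record-per-line layout, and the hex encoding of binary values are assumptions; the disclosure does not spell out the normalization rules, and single-part field names such as sigid: are not covered.

```python
from typing import Iterable, Tuple, Union

Value = Union[str, bytes]


def to_record(event: str, field: str, value: Value) -> str:
    """Format one event value as '<event>:<field>:<value>'."""
    if isinstance(value, bytes):
        value = value.hex().upper()  # assumption: binary data is stored as a hex string
    return f"{event.lower()}:{field.lower()}:{value}"


def build_dynamic_data(events: Iterable[Tuple[str, str, Value]]) -> str:
    """Join all records, one per line (the line separation is an assumption)."""
    return "\n".join(to_record(e, f, v) for e, f, v in events)
```

For instance, `to_record("mutantCreated", "name", "windows")` yields a record that a rule can match without also matching the word "windows" in unrelated events.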

The collected data can be stored in a special format. Different field names (derived from sandbox event names and value field names) can be used to represent event data, for example, staticgen:filetype:, sigid:, etc. This helps in writing a YARA rule on the exact event data and avoids False Positives. For example, the following YARA rule triggers if the string “windows” is found in mutex (mutual exclusion) data only. Without the “mutantcreated:name:” field, it could cause an FP since the “windows” string can be found in data of other events as well.

  rule Win32_Testing_Rule1 : knownmalwareDS
  {
    strings:
      $str1 = "mutantcreated:name: windows"
    condition:
      all of them
  }

Unpacked PE File Extraction

Dynamic data also includes the content of unpacked PE files. Unpacked PE file data can be provided in a memwritten:value: field. This event also provides data written to other processes, such as by using the Windows APIs WriteProcessMemory and NtWriteProcessMemory. A PE file extraction method can extract an unpacked PE file for malware using remote process injection, process hollowing, or self-injection unpacking methods.

For extracting unpacked PE files from malware that uses process injection or process hollowing techniques, the Dynamic YARA engine uses the “memWritten” event of the sandbox 101. This event provides data written to any process memory. The Dynamic YARA engine only extracts memory data that has been written to the memory area of another process using the WriteProcessMemory and NtWriteProcessMemory Windows APIs. If the MZ string is found, the full memory data will be added to the dynamic data; otherwise, only the first 746 bytes will be added. As is known in the art, an MZ string is an indication of an executable file in Windows.
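The handling of a “memWritten” event described above can be sketched in Python as follows. The function name and the event representation are hypothetical, and the check for the MZ string anywhere in the data (rather than only at the start) is an assumption, since the text only says the string is "found".

```python
from typing import Optional

MZ_MARKER = b"MZ"
NON_PE_PREFIX_LEN = 746  # bytes kept when no MZ string is present

CROSS_PROCESS_APIS = {"WriteProcessMemory", "NtWriteProcessMemory"}


def mem_written_payload(api: str, data: bytes) -> Optional[bytes]:
    """Return the bytes to add to the dynamic data, or None if the event is ignored."""
    if api not in CROSS_PROCESS_APIS:
        return None                      # only writes into other processes are extracted
    if MZ_MARKER in data:                # assumption: MZ may appear anywhere in the data
        return data                      # full memory data
    return data[:NON_PE_PREFIX_LEN]      # only the first 746 bytes
```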

For extracting unpacked files from memory dumps and self-injection unpacking, the Dynamic YARA engine can listen for “memAlloc,” “memProtect,” and “memdumps” sandbox events. The “memAlloc” and “memProtect” events provide details about virtual memory modifications done by malware during execution in the sandbox 101, and “memdumps” provides memory dump files.

The following method can be used to extract an unpacked PE file:

-   1) Store the virtual memory base address and length if virtual memory is allocated or virtual memory protection is changed using the “VirtualAlloc,” “VirtualProtect,” “NtAllocateVirtualMemory,” or “NtProtectVirtualMemory” Windows APIs.
-   2) For each virtual memory base address, locate the corresponding memory dump file.
-   3) For each memory dump file found in step 2), read the first two bytes from offset zero and match with the “MZ” marker.
-   4) If the “MZ” marker is found, extract the PE file using the base address and length values collected in step 1).
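The four steps above can be sketched in Python as follows. The function name, the `regions` list, and the `dumps` dictionary keyed by base address are hypothetical stand-ins for the data carried by the “memAlloc,” “memProtect,” and “memdumps” events; the sketch also assumes each dump begins at its region's base address.

```python
from typing import Dict, List, Tuple

MZ_MARKER = b"MZ"


def extract_unpacked_pe(
    regions: List[Tuple[int, int]],  # step 1: (base_address, length) per tracked region
    dumps: Dict[int, bytes],         # hypothetical: memory dump content keyed by base address
) -> List[bytes]:
    extracted = []
    for base, length in regions:
        dump = dumps.get(base)               # step 2: locate the dump for this base address
        if dump is None:
            continue
        if dump[:2] == MZ_MARKER:            # step 3: first two bytes at offset zero
            extracted.append(dump[:length])  # step 4: cut out the PE file by length
    return extracted
```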

To avoid duplication, the MD5 of all extracted PE files is stored and compared to determine if the PE file has been analyzed already.
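A minimal sketch of this de-duplication, using Python's standard `hashlib` module, is shown below. The class name `SeenPeFiles` is illustrative only.

```python
import hashlib
from typing import Set


class SeenPeFiles:
    """Remembers the MD5 of every extracted PE file seen so far."""

    def __init__(self) -> None:
        self._md5s: Set[str] = set()

    def is_new(self, pe_bytes: bytes) -> bool:
        digest = hashlib.md5(pe_bytes).hexdigest()
        if digest in self._md5s:
            return False          # already analyzed, skip it
        self._md5s.add(digest)
        return True
```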

Dynamic Data File and YARA Scanning

The dynamic data is stored in a file for scanning. The location of the dynamic data file is mentioned in a config.properties file (Table 2). In an embodiment, the maximum size limit for the dynamic data file is 100 MB.

The dynamic data file is scanned using a YARA command-line tool. For example, the syntax of the command is:

-   yara -f -g -s <dynamic_rule_file> <dynamic_data_file>

Dynamic YARA Configuration Options

The following new configuration options are added for the Dynamic YARA engine. These config options are defined in the config.properties file.

TABLE 2 Dynamic YARA engine config options
Config option                      Default value   Comment
MAX_ZSYARABUFFERSIZEINMB           100 MB          Max dynamic YARA data
MAX_ZSYARAPEDUMPSIZEINMB           100 MB          Max memdump file size.
MAX_ZSYARAMEMWRITTENDUMPSIZEINMB   100 MB          Max extracted PE file size.
ZSYARARULEPATH                     -               Dynamic YARA rules file.
ZSYARADATAFOLDER                   -               Dynamic YARA data file location.
ZSYARADEBUG                        -               Debug flag, True to disable deletion

Dynamic YARA Python Signature

A new Python signature is used in the Dynamic YARA approach. This new signature can merge Known Clean File detection and Known Malicious File detection Python signatures.

This new Python signature listens for “zsyarahit” and “sighits” events. “zsyarahit” provides details of dynamic YARA rules that hit on the dynamic data file. This signature decides about the dynamic YARA rule score based on the rule tag (discussed in the following section). It also collects contextual information about YARA rule hits. This information is shown in a BA UI report along with dynamic YARA rule names (FIG. 9). The maximum limit for contextual information can be 250 characters.
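As an illustration of how rule hits, tags, and contextual information might be gathered from the YARA command-line output, the following Python sketch parses text produced with the tag (-g) and string (-s) options. It assumes the usual output layout of the YARA command-line tool, i.e. a "rule [tags] file" line followed by "offset:$identifier: data" lines; the `RuleHit` class and `parse_yara_output` function are hypothetical names and do not describe the actual signature.

```python
import re
from dataclasses import dataclass
from typing import List

MAX_CONTEXT_CHARS = 250  # limit for contextual information

# Assumed layout: "<rule> [<tag>,<tag>] <scanned file>"
RULE_LINE = re.compile(r"^(\w+) \[([^\]]*)\] (.+)$")
# Assumed layout: "0x<offset>:$<identifier>: <matched data>"
STRING_LINE = re.compile(r"^0x[0-9a-fA-F]+:\$\w*: (.*)$")


@dataclass
class RuleHit:
    rule: str
    tags: List[str]
    context: str = ""


def parse_yara_output(output: str) -> List[RuleHit]:
    hits: List[RuleHit] = []
    for line in output.splitlines():
        rule_match = RULE_LINE.match(line)
        if rule_match:
            tags = [t for t in rule_match.group(2).split(",") if t]
            hits.append(RuleHit(rule_match.group(1), tags))
            continue
        string_match = STRING_LINE.match(line)
        if string_match and hits:
            # Append matched data to the latest hit, capped at the context limit.
            joined = (hits[-1].context + " " + string_match.group(1)).strip()
            hits[-1].context = joined[:MAX_CONTEXT_CHARS]
    return hits
```

Each `RuleHit` then carries what the report needs: the rule name, the tags that drive scoring, and a bounded excerpt of the matched data.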

Dynamic YARA Rules

The Dynamic YARA rules can be ordinary YARA rules. A new YARA file, dynamic_ba_yara.yara, can hold the YARA rules.

The Dynamic YARA rules can use specific tag names. These tags control the type and score of the rule. There can be two types of dynamic YARA tags, knownclean and knownmalware, each with a specific score.

A knownclean tag is used for clean samples; it marks any sample as clean, regardless of the DA score. This can be done using special score “−127”.

Here is an example YARA rule for knownclean:

rule Gen_Installer : knownclean
{
  strings:
    $sig_id = "sigid:11004" // Known malicious MD5
    $dropped_file_count = "filedump:path:"
    $pattern = /window:title:.{0,100}(setup|install|wizard).{0,100}\nwindow:text:.{0,1000}(next|back|close|exit|decline|accept|cancel)/ nocase
  condition:
    #dropped_file_count > 3 and $pattern and not $sig_id
}

The knownmalware tag has the following sub-tags, all of which are used to detect malware. These tags also specify the score for the rule (Table 3).

TABLE 3 Dynamic YARA rule tags, ordered by priority
Priority   Tag name         Score
1          knownclean       −127
2          knownmalwareDS   Dynamic score
3          knownmalware     127
4          knownmalware40   40
5          knownmalware20   20
6          knownmalware10   10
7          knownmalware0    0

The knownmalware tag is to mark any sample as malware (using special score 127) regardless of the DA score. Since there are more granular scoring tags, the knownmalware tag is generally not used. All the other knownmalware tags add 40, 20, 10, or 0 scores to a DA score. In case of multiple dynamic YARA hits, the priority mentioned in Table 3 is used, and the final score is added to DA.

The knownmalwareDS tag is a special tag that is used for dynamic scoring. This is used when there is a desire to adjust the score of a YARA rule automatically based on the DA score. The Dynamic YARA rule using this tag will always mark the sample as malware but add only the required score to DA. It can use the following method to decide the score (Table 4):

TABLE 4 Dynamic score mapping
DA Score        YARA rule score
120 and above   0
100-110         10
80-90           20
50-70           40
40 and less     127

So, this helps in using the same dynamic YARA rule for attribution and detection. The knownmalwareDS tag is very useful for malware that uses anti-sandbox techniques or any downloader that was not able to download the payload.
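The tag priority of Table 3 and the dynamic score mapping of Table 4 can be combined in a short Python sketch. Several points are assumptions rather than statements of the disclosure: Table 4 leaves gaps between its ranges, so the ranges are read here as lower-bound thresholds; the special scores 127 and −127 are treated as overriding the DA score; and all function names are illustrative.

```python
from typing import Iterable, Optional

# Tag names in priority order, following Table 3 (index 0 has the highest priority).
TAG_PRIORITY = [
    "knownclean", "knownmalwareDS", "knownmalware",
    "knownmalware40", "knownmalware20", "knownmalware10", "knownmalware0",
]
FIXED_SCORES = {
    "knownclean": -127, "knownmalware": 127,
    "knownmalware40": 40, "knownmalware20": 20, "knownmalware10": 10, "knownmalware0": 0,
}
SPECIAL_SCORE = 127


def dynamic_rule_score(da_score: int) -> int:
    """Table 4, read as lower-bound thresholds (assumption: the table leaves gaps)."""
    for lower_bound, score in ((120, 0), (100, 10), (80, 20), (50, 40)):
        if da_score >= lower_bound:
            return score
    return SPECIAL_SCORE


def winning_tag(hit_tags: Iterable[str]) -> Optional[str]:
    """Pick the highest-priority known tag among all rule hits."""
    known = [t for t in hit_tags if t in TAG_PRIORITY]
    return min(known, key=TAG_PRIORITY.index) if known else None


def adjust_score(da_score: int, hit_tags: Iterable[str]) -> int:
    tag = winning_tag(hit_tags)
    if tag is None:
        return da_score                  # no dynamic YARA hit, score unchanged
    if tag == "knownmalwareDS":
        score = dynamic_rule_score(da_score)
    else:
        score = FIXED_SCORES[tag]
    if abs(score) == SPECIAL_SCORE:
        return score                     # assumption: special scores override the DA score
    return da_score + score
```

Under these assumptions, a sample with a DA score of 85 and a knownmalwareDS hit ends at 105, while a sample with a DA score of 30 and the same hit ends at 127.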

Dynamic YARA Process

FIG. 10 is a flowchart of a process 950 for dynamic rules in a cloud-based sandbox. The process 950 can be a computer-implemented method, embodied as instructions in a non-transitory computer-readable medium, and implemented via the server 200. The process 950 includes receiving unknown content in a cloud-based sandbox (step 952); performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware (step 954); obtaining events based on the analysis (step 956); running one or more rules on the events (step 958); and adjusting the score based on a result of the one or more rules (step 960). The process 950 can further include classifying the unknown content as malware or clean based on the adjusted score.

The analysis can include a static analysis and a dynamic analysis. The events are generated during the static analysis and the dynamic analysis. The events can include any of file extension, signature hits, paths, title and text of windows created, DNS query names, processes created, memory information, mutex names, HTTP data, and registry information. The events can be processed and stored in a dynamic data buffer in a specific format, for processing by the one or more rules. The events can include content of unpacked files determined to be executable files. The adjusting can include a dynamic score for the one or more rules based on the score from the analysis.

It will be appreciated that some embodiments described herein may include or utilize one or more generic or specialized processors (“one or more processors”) such as microprocessors; Central Processing Units (CPUs); Digital Signal Processors (DSPs); customized processors such as Network Processors (NPs) or Network Processing Units (NPUs), Graphics Processing Units (GPUs), or the like; Field-Programmable Gate Arrays (FPGAs); and the like along with unique stored program instructions (including both software and firmware) for control thereof to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the methods and/or systems described herein. Alternatively, some or all functions may be implemented by a state machine that has no stored program instructions, or in one or more Application-Specific Integrated Circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic or circuitry. Of course, a combination of the aforementioned approaches may be used. For some of the embodiments described herein, a corresponding device in hardware and optionally with software, firmware, and a combination thereof can be referred to as “circuitry configured to,” “logic configured to,” etc. perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. on digital and/or analog signals as described herein for the various embodiments.

Moreover, some embodiments may include a non-transitory computer-readable medium having instructions stored thereon for programming a computer, server, appliance, device, processor, circuit, etc. to perform functions as described and claimed herein. Examples of such non-transitory computer-readable medium include, but are not limited to, a hard disk, an optical storage device, a magnetic storage device, a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically EPROM (EEPROM), Flash memory, and the like. When stored in the non-transitory computer-readable medium, software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. as described herein for the various embodiments.

Although the present disclosure has been illustrated and described herein with reference to preferred embodiments and specific examples thereof, it will be readily apparent to those of ordinary skill in the art that other embodiments and examples may perform similar functions and/or achieve like results. All such equivalent embodiments and examples are within the spirit and scope of the present disclosure, are contemplated thereby, and are intended to be covered by the following claims.

What is claimed is:
 1. A non-transitory computer-readable medium having instructions stored thereon for programming a cloud-based sandbox comprising one or more processors to perform steps of: receiving unknown content in the cloud-based sandbox that is located inline between devices associated with the unknown content; performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; allowing the unknown content from the cloud-based sandbox responsive to a determination of the analysis that the unknown content is not malware; responsive to the determination of the analysis that the unknown content is malware, obtaining events based on the analysis; running one or more rules on the events; and adjusting the score based on a result of the one or more rules, classifying the unknown content as malware or clean based on the adjusted score, and allowing or blocking the unknown content based on the classifying.
 2. The non-transitory computer-readable medium of claim 1, wherein the analysis includes a static analysis and a dynamic analysis.
 3. The non-transitory computer-readable medium of claim 2, wherein the events are generated during the static analysis and the dynamic analysis.
 4. The non-transitory computer-readable medium of claim 3, wherein the events include any of file extension, signature hits, paths, title and text of windows created, DNS query names, processes created, memory information, mutex names, HTTP data, and registry information.
 5. The non-transitory computer-readable medium of claim 3, wherein the events are processed and stored in a dynamic data buffer in a specific format, for processing by the one or more rules.
 6. The non-transitory computer-readable medium of claim 3, wherein the events include content of unpacked files determined to be executable files.
 7. The non-transitory computer-readable medium of claim 3, wherein the adjusting includes a dynamic score for the one or more rules based on the score from the analysis.
 8. The non-transitory computer-readable medium of claim 1, wherein a rule of the rules includes sets of strings and Boolean expressions, and wherein the rule is to address a determined False Positive score.
 9. The non-transitory computer-readable medium of claim 1, wherein a rule of the rules includes a tag and a special score that is used in the adjusting the score either towards being clean or malware.
 10. The non-transitory computer-readable medium of claim 9, wherein the rule is for detecting anti-sandbox techniques and the special score for the rule is set to mark any unknown content which triggers the rule as malware.
 11. An apparatus comprising: a network interface; a data store; a processor communicatively coupled to the network interface and the data store; memory storing instructions that, when executed, cause the processor to: receive unknown content in a cloud-based sandbox that is located inline between devices associated with the unknown content; perform an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; allow the unknown content from the cloud-based sandbox responsive to a determination of the analysis that the unknown content is not malware; responsive to the determination of the analysis that the unknown content is malware, obtain events based on the analysis, run one or more rules on the events; and adjust the score based on a result of the one or more rules, classify the unknown content as malware or clean based on the adjusted score, and allow or block the unknown content based on the classifying.
 12. The apparatus of claim 11, wherein the analysis includes a static analysis and a dynamic analysis.
 13. The apparatus of claim 12, wherein the events are generated during the static analysis and the dynamic analysis.
 14. The apparatus of claim 13, wherein the events include any of file extension, signature hits, paths, title and text of windows created, DNS query names, processes created, memory information, mutex names, HTTP data, and registry information.
 15. The apparatus of claim 13, wherein the events are processed and stored in a dynamic data buffer in a specific format, for processing by the one or more rules.
 16. The apparatus of claim 13, wherein the events include content of unpacked files determined to be executable files.
 17. A computer-implemented method comprising: receiving unknown content in a cloud-based sandbox that is located inline between devices associated with the unknown content; performing an analysis of the unknown content in the cloud-based sandbox, to obtain a score to determine whether or not the unknown content is malware; allowing the unknown content from the cloud-based sandbox responsive to a determination of the analysis that the unknown content is not malware; responsive to the determination of the analysis that the unknown content is malware, obtaining events based on the analysis; running one or more rules on the events; and adjusting the score based on a result of the one or more rules, classifying the unknown content as malware or clean based on the adjusted score, and allowing or blocking the unknown content based on the classifying.
 18. The computer-implemented method of claim 17, wherein the analysis includes a static analysis and a dynamic analysis.
 19. The computer-implemented method of claim 18, wherein the events are generated during the static analysis and the dynamic analysis.
 20. The computer-implemented method of claim 19, wherein the events include any of file extension, signature hits, paths, title and text of windows created, DNS query names, processes created, memory information, mutex names, HTTP data, and registry information.
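
By way of non-limiting illustration only, the flow recited in claims 1 and 8 to 10 can be sketched in Python as follows. The sketch is one possible reading under stated assumptions and is not the claimed implementation: the 0 to 100 score scale, the `MALWARE_THRESHOLD` cutoff, the event string format, the treatment of the special score as a signed adjustment, and every name (`Rule`, `run_rules`, `adjust_score`, `handle`, `RULES`) are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MALWARE_THRESHOLD = 70  # assumed cutoff on an assumed 0-100 score scale


@dataclass
class Rule:
    tag: str                                      # label for the rule
    strings: Dict[str, str]                       # identifier -> substring to search for
    condition: Callable[[Dict[str, bool]], bool]  # Boolean expression over string hits
    special_score: int                            # assumed signed adjustment when the rule fires


def run_rules(rules: List[Rule], events: List[str]) -> List[Rule]:
    """Return the rules whose Boolean condition holds on the event buffer."""
    buffer = "\n".join(events)  # stand-in for the dynamic data buffer
    fired = []
    for rule in rules:
        hits = {name: text in buffer for name, text in rule.strings.items()}
        if rule.condition(hits):
            fired.append(rule)
    return fired


def adjust_score(score: int, fired: List[Rule]) -> int:
    """Shift the score by each fired rule's special score, clamped to 0-100."""
    for rule in fired:
        score += rule.special_score
    return max(0, min(100, score))


def handle(score: int, events: List[str], rules: List[Rule]) -> str:
    """Allow content the analysis scored as clean; otherwise run the rules first."""
    if score < MALWARE_THRESHOLD:
        return "allow"
    adjusted = adjust_score(score, run_rules(rules, events))
    return "block" if adjusted >= MALWARE_THRESHOLD else "allow"


# Two hypothetical rules: one correcting a False Positive, one for anti-sandbox checks.
RULES = [
    Rule(
        tag="fp_trusted_installer",
        strings={"installer": "process_created: setup.exe",
                 "vendor": "signature: TrustedVendor"},
        condition=lambda h: h["installer"] and h["vendor"],
        special_score=-60,  # pushes the score towards clean
    ),
    Rule(
        tag="anti_sandbox",
        strings={"probe": "mutex: sandbox_probe"},
        condition=lambda h: h["probe"],
        special_score=100,  # large enough to mark the content as malware
    ),
]
```

In this sketch, content scored 80 whose events match both strings of the False Positive rule is adjusted to 20 and allowed, while content scored 75 that triggers the anti-sandbox rule is clamped to 100 and blocked. Content scored below the threshold is allowed without the rules being run, mirroring the ordering of steps in claim 1.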